A Comparative Analysis of Face Recognition Performance
with Visible and Thermal Infrared Imagery*


Diego A. Socolinsky†  Andrea Selinger‡


† Equinox Corporation
207 East Redwood Street
Baltimore, MD 21202


‡ Equinox Corporation
9 West 57th Street
New York, NY 10019


{diego, andrea}@equinoxsensors.com


Abstract

We present a comprehensive performance analysis of multiple appearance-based
face recognition methodologies on visible and thermal infrared imagery. We
compare algorithms within and between modalities in terms of recognition
performance, false alarm rates and requirements to achieve specified
performance levels. The effect of illumination conditions on recognition
performance is emphasized, as it underlines the relative advantage of
radiometrically calibrated thermal imagery for face recognition.

1 Introduction

Face recognition in the thermal infrared domain has received relatively little
attention in the literature in comparison with recognition in visible-spectrum
imagery. Early, tentative analyses have focused mostly on validating thermal
imagery of faces as a biometric [1, 2]. The lower level of interest in infrared
imagery has been based in part on the following factors: much higher cost of
thermal sensors versus visible video equipment, lower image resolution, higher
image noise, and lack of widely available data sets. These historical
objections are becoming less relevant as infrared imaging technology advances,
making it attractive to consider thermal sensors in the context of face
recognition. In the current study, we focus our attention on longwave infrared
(LWIR) imagery, in the spectral range of 8µm-12µm. Other regions of the
infrared spectrum also hold promise, and will be considered in upcoming work.

The influence of varying ambient illumination on systems using visible imagery
is well known to be one of the major limiting factors for recognition
performance [2, 3]. A variety of methods for compensating for variation in
illumination have been studied in order to boost recognition performance,
including histogram equalization, Laplacian transforms, Gabor transforms,
logarithmic transforms, and 3-D shape-based methods. These techniques aim at
reducing the within-class variability introduced by changes in illumination,
which has been shown to be often larger than the between-class variability in
the data, thus severely affecting classification performance.

Thermal infrared imagery of faces is nearly invariant to changes in ambient
illumination [4]. Consequently, no compensation is necessary, and within-class
variability is significantly lower than that observed in visible imagery. As a
matter of fact, it is well known that under the assumption of Lambertian
reflection, the set of images of a given face acquired under all possible
illumination conditions is a subspace of the vector space of images of fixed
dimensions. In sharp contrast to this, the set of LWIR images of a face under
all possible imaging conditions is contained in a bounded set. It follows that
under general conditions we can expect lower within-class variation for LWIR
images of faces than for their visible counterparts. It remains to demonstrate
that there is sufficient between-class variability to ensure high
discrimination.

Previous work by the authors provides a starting point for the current
analysis. In [5], the authors perform a comparison of recognition performance
between visible and longwave infrared imagery, based on two standard
appearance-based algorithms: Eigenfaces and ARENA. The preliminary nature of
that study limited the performance analysis to top-match recognition rates on
various scenarios obtained by varying the training and testing sets, in a
fashion reminiscent of n-fold cross-validation. No mention is made of
false-alarm rates, receiver-operating-characteristic (ROC) curves or
performance-versus-rank curves.

The current work builds on our previous research and expands to cover those
areas not touched upon therein. In addition, we provide a much broader
comparison including several other appearance-based face recognition
algorithms based on more sophisticated representations, better approximating
the state of the art in the field. While still within the limitations imposed
by existing data sets, we feel that the analysis below provides a firm basis
for evaluation of thermal imagery as a valid biometric identification tool.

Figure 1: Camera and lighting setup for data collection.

2 Data Collection and Calibration

All data used to obtain the results below was acquired with a newly developed
sensor capable of capturing simultaneous coregistered video sequences with a
visible CCD array and LWIR microbolometer. This is of particular significance
for our tests, since it allows performance comparison on precisely the same
imagery, much like using the red and blue channels of a color image.

We collected the data during a two-day period at the National Institute of
Standards and Technology (NIST). The format consists of 240x320 pixel image
pairs, co-registered to within 1/3 pixel, where the visible image has 8 bits of
grayscale resolution and the LWIR has 12 bits.

All of the LWIR imagery was radiometrically calibrated. Since the response of
LWIR sensors is very nearly linear, the pixelwise relation between gray values
and radiant power can be computed by a process of two-point calibration. Images
of a black-body radiator covering the entire field of view are taken at two
known temperatures, and the gains and offsets are then computed using the
radiant power for a black body at a given temperature. A complete explanation
of the process can be found in [5], but we should note here that radiometric
calibration of thermal images removes extrinsic variations due to sensor and
environmental factors, yielding a physically meaningful measurement of the
scene's radiance.

2.1 The Collection Setup

For the collection of our images, we used the FBI mugshot standard light
arrangement, shown in Figure 1. Image sequences were acquired with three
illumination conditions: frontal, left lateral and right lateral. For each
subject and illumination condition, a 40-frame, four-second image sequence was
recorded while the subject pronounced the vowels looking towards the camera.
After the initial 40 frames, three static shots were taken while the subject
was asked to act out the expressions 'smile', 'frown', and 'surprise'. In
addition, for those subjects who wore glasses, the entire process was done with
and without glasses. Figure 2 shows a sampling of the data in both modalities.

Figure 2: Sample imagery from our data collection.

A total of 115 subjects were imaged during a two-day period. After removing
corrupted imagery from 24 subjects, our test database consists of over 25,000
frames from 91 distinct subjects. Much of the data is highly correlated, so
only specific portions of the database can be used for training and testing
purposes without creating unrealistically simple recognition scenarios. This is
explained in Section 3. The entire image collection used for the experiments
below is available at the authors' website¹.

3 Testing Methodology

Following the approach in [5], we selected subsets of our face database to be
used as testing and training sets. In n-fold cross-validation experiments, one
repeatedly selects a random subset of the available data as a training set, and
testing is performed on the remaining data. Repeating this process multiple
times and reporting mean performance yields statistically significant results.
We are particularly interested in exposing the relation between variation in
illumination and facial expression on the one hand, and recognition performance
on the other. Therefore, we chose our training/testing pairs in a biased
fashion rather than randomly, in order to elicit the desired information. Note
that based on the choices below, our testing methodology is stricter, and
should produce lower average results than random cross-validation.
Additionally, since much of our data is highly correlated due to the
acquisition procedure, the biased choices below help decorrelate testing and
training sets.

¹ http://www.equinoxsensors.com/hid

We construct multiple query sets for testing and training. Frames 0, 3 and 9
from a given image sequence are referred to as vowel frames. Frames
corresponding to 'smile', 'frown' and 'surprise' are referred to as expression
frames. Our query criteria are as follows:

VA: Vowel frames, all subjects, all illuminations.

EA: Expression frames, all subjects, all illuminations.

VF: Vowel frames, all subjects, frontal illumination.

EF: Expression frames, all subjects, frontal illumination.

VL: Vowel frames, all subjects, lateral illumination.

EL: Expression frames, all subjects, lateral illumination.

VG: Vowel frames, subjects wearing glasses, all illuminations.

EG: Expression frames, subjects wearing glasses, all illuminations.

RR: 500 random frames, arbitrary illumination.
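
The query criteria above can be read as simple filters over per-frame metadata. The following is a minimal sketch under that reading; the `Frame` record, its field names and the `select` helper are hypothetical and are not taken from the authors' database. In particular, treating "subjects wearing glasses" as a per-frame flag is an assumption. The random set RR is omitted because it is a sample rather than a filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical metadata record for one co-registered image pair."""
    subject: int        # subject identifier
    kind: str           # "vowel" or "expression"
    illumination: str   # "frontal", "left" or "right"
    glasses: bool       # assumed: True if glasses are worn in this frame


LATERAL = ("left", "right")

# Each query criterion is a predicate over one frame's metadata.
QUERIES = {
    "VA": lambda f: f.kind == "vowel",
    "EA": lambda f: f.kind == "expression",
    "VF": lambda f: f.kind == "vowel" and f.illumination == "frontal",
    "EF": lambda f: f.kind == "expression" and f.illumination == "frontal",
    "VL": lambda f: f.kind == "vowel" and f.illumination in LATERAL,
    "EL": lambda f: f.kind == "expression" and f.illumination in LATERAL,
    "VG": lambda f: f.kind == "vowel" and f.glasses,
    "EG": lambda f: f.kind == "expression" and f.glasses,
}


def select(frames, name):
    """Return the frames that satisfy the named query criterion."""
    return [f for f in frames if QUERIES[name](f)]
```

Under this reading, VF, VL and VG are subsets of VA (and likewise for the expression sets), which is one way such sets can be related by inclusion.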

The same queries were used to construct sets for visible and LWIR imagery, and
all LWIR images were radiometrically calibrated. The eyes and the frenulum were
semi-automatically located in all visible images, which also provided the
corresponding locations in the co-registered LWIR frames. Using these feature
locations, all images were geometrically transformed to a common standard, and
cropped to eliminate all but the inner face. Query set RR is used to compute
all relevant subspaces and basis sets for the algorithms below, unless
otherwise noted. Additionally, some testing/training combinations are omitted
from the tables due to inclusion relations.

Tabular performance results reported below are for the top match. We also
report, in graphical form, recognition performance as a function of rank. In
this case, for a fixed rank k > 1, a probe is considered correctly classified
if any of the top k matches are correct. Note that this is not the same as a
k-nearest-neighbor classifier.

When reviewing rank-ordered match results, in addition to the rate of correct
recognition, we must also consider the false-alarm rate incurred by relaxing
our correctness criterion. Let $T$ be a training set and $P$ a set of probes.
For $p \in P$, let $M_p$ be the distance from $p$ to the $k$th closest training
observation, and $H_p = \{t \in T \mid \mathrm{dist}(p, t) \le M_p\}$. Define
$a_p$ to be 1 if any member of $H_p$ belongs to the same class as $p$, and zero
otherwise. Further define $\|H_p\|$ to be the number of distinct class labels
among elements of $H_p$ and $\|P\|$ the number of probes in $P$. With this
notation, the correct classification rate and false alarm rate are respectively
given by

$$\frac{\sum_{p \in P} a_p}{\|P\|} \qquad \text{and} \qquad \frac{\sum_{p \in P} \left( \|H_p\| - a_p \right)}{\sum_{p \in P} \|H_p\|}.$$
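
A minimal sketch of how the two rates can be computed for a fixed rank k, using the quantities defined above. It assumes Euclidean distance, assumes k does not exceed the number of training observations, and takes the false alarm rate to be the fraction of returned class labels that are incorrect. The function name and the data layout (lists of `(vector, label)` pairs) are illustrative choices, not the authors' implementation.

```python
import math


def rank_k_rates(train, probes, k):
    """Return (correct classification rate, false alarm rate) at rank k.

    train, probes: lists of (vector, label) pairs.
    """
    correct = 0    # running sum of a_p over all probes
    returned = 0   # running sum of the number of distinct labels in H_p
    for p_vec, p_label in probes:
        dists = sorted(
            ((math.dist(p_vec, t_vec), t_label) for t_vec, t_label in train),
            key=lambda pair: pair[0],
        )
        m_p = dists[k - 1][0]   # distance to the kth closest training item
        # Distinct class labels among training items no farther than m_p.
        h_labels = {label for d, label in dists if d <= m_p}
        a_p = 1 if p_label in h_labels else 0
        correct += a_p
        returned += len(h_labels)
    return correct / len(probes), (returned - correct) / returned
```

Raising k can only increase the correct classification rate, but every additional label returned for a probe beyond the correct one counts as a false alarm, which is why the two rates need to be read together.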


4 Algorithms Tested

The testing methodology outlined above was applied to several appearance-based
algorithms. We should point out that the restriction to appearance-based
techniques was motivated by the fact that geometry-based methods depend only on
the ability to accurately locate facial landmarks in the image. While such
landmarks may be more easily located in one modality than in the other, the
effect of the imaging modality on the final recognition outcome is indirect,
and thus an analysis of that effect would be less revealing. In addition,
appearance-based methods have generally shown higher performance than those
based on facial geometry alone.

All algorithms tested consist of a projection to a subspace of the image space
followed by 1-nearest-neighbor classification. The different subspace
constructions are briefly outlined below. For complete details see [6]. Digital
images are converted into vectors by scanning in raster order.
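
The common structure shared by all the algorithms (raster scan, linear projection, 1-nearest-neighbor match) can be sketched as follows. This is an illustration only: the function names are hypothetical, the basis is assumed to be given as a list of row vectors, Euclidean distance in the subspace is an assumption, and subtracting a mean image before projecting is a common convention rather than something stated in the text.

```python
import math


def raster(image):
    """Flatten a 2-D image (list of rows) into a vector in raster order."""
    return [value for row in image for value in row]


def project(vector, basis, mean):
    """Coefficients of (vector - mean) along each row of `basis`."""
    centered = [v - m for v, m in zip(vector, mean)]
    return [sum(b * c for b, c in zip(row, centered)) for row in basis]


def nearest_neighbor(probe, gallery, basis, mean):
    """Label of the gallery item closest to the probe in the subspace.

    gallery: list of (vector, label) pairs in the original image space.
    """
    q = project(probe, basis, mean)
    best = min(
        gallery,
        key=lambda item: math.dist(q, project(item[0], basis, mean)),
    )
    return best[1]
```

Only `basis` changes between algorithms, so the comparison between methods reduces to a comparison between subspace constructions.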

4.1 Eigenfaces (PCA)

This is perhaps the most popular algorithm in the field [7]. The face space is
computed by taking a (usually separate) set of training observations, and
finding the unique ordered orthonormal basis of the data space that
diagonalizes the covariance matrix of those observations, ordered by the
variances along the corresponding one-dimensional subspaces. These vectors are
known as principal components, or eigenfaces. It is well known that, for a
fixed choice of n, the subspace spanned by the first n basis vectors is the one
with lowest mean $L^2$ reconstruction error over the training set used to
create the face space. Under the assumption that the training set is
representative of all face images, the face space is taken to be a good
low-dimensional approximation to the set of all possible face images under
varying conditions.
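
As an illustration of the construction just described, the sketch below computes the leading principal components of a small data set by power iteration with deflation. This is a didactic stand-in, not the authors' implementation: for image-sized vectors one would normally use a library eigenvalue or SVD routine instead, and the function names here are hypothetical.

```python
import math


def covariance(data):
    """Covariance matrix of the row vectors in `data`."""
    n, d = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, mean)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]   # arbitrary fixed starting vector
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(d) for j in range(d))
    return eigenvalue, v


def principal_components(data, n_components):
    """First n_components covariance eigenvectors, by decreasing variance."""
    matrix = covariance(data)
    d = len(matrix)
    components = []
    for _ in range(n_components):
        lam, v = dominant_eigenpair(matrix)
        components.append(v)
        # Deflation: remove the direction just found so the next one dominates.
        matrix = [[matrix[i][j] - lam * v[i] * v[j] for j in range(d)]
                  for i in range(d)]
    return components
```

The rows returned by `principal_components` play the role of the eigenfaces; keeping only the first n of them gives the n-dimensional face space.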

4.2 Linear Discriminant Analysis (LDA)

It is a classical result that while the feature subspace used by Eigenfaces,
obtained through principal component analysis, is optimal in terms of $L^2$
reconstruction error, it has no optimality properties in terms of class
discriminability. In fact, class membership is not taken into account in the
construction of the face space. Under the assumption of homoscedastic,
Gaussian-distributed classes and linear separability, one can show that the
optimal subspace in which to perform classification is spanned by the solution
vectors $w$ of the generalized eigenvalue problem $S_b w = \lambda S_w w$,
where $S_w$ and $S_b$ are the within-class and between-class scatter matrices,
respectively. This gives rise to the algorithm popularized as Fisherfaces [8].
We consider two slight variants, referred to below as LDAg and LDAt; details on
the differences may be found in [6].
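
For reference, the scatter matrices are commonly defined as follows. This is the usual textbook formulation, given here as an assumption since the text defers the details to [6]; the symbols $c$ (number of classes), $C_i$ (the $i$th class), $N_i$ (its size), $\mu_i$ (its mean) and $\mu$ (the global mean) are introduced only for this illustration.

```latex
S_w = \sum_{i=1}^{c} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^{T},
\qquad
S_b = \sum_{i=1}^{c} N_i \, (\mu_i - \mu)(\mu_i - \mu)^{T},
\qquad
S_b \, w = \lambda \, S_w \, w .
```

Because $S_b$ is a sum of $c$ rank-one terms whose weighted mean deviations sum to zero, its rank is at most $c - 1$, so the problem has at most $c - 1$ nonzero generalized eigenvalues.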

4.3 Local Feature Analysis (LFA)

Another subspace representation for facial data based on second-order
statistics results from enforcing topographic indexing of the basis vectors and
minimizing their correlation. Local Feature Analysis [9] achieves this by
constructing a family of feature detectors based on a PCA decomposition, which
are locally correlated. A selection, or sparsification, step is then used to
produce a minimally correlated subset of features, which define the subspace of
interest. While the original method is geared towards optimal reconstruction,
sparsification techniques consistent with the requirements of a recognition
system are also possible. We use two subselection methods, one following [10]
and the other explained in detail in [6], referred to below as LFAb and LFAe,
respectively.

4.4 Independent Component Analysis (ICA)

Principal component analysis seeks an orthonormal basis for the data space with
respect to which the marginal training distributions are uncorrelated.
Independent component analysis goes further by requiring a basis (not
necessarily orthogonal) such that the corresponding marginals are statistically
independent. Note that these conditions are equivalent if the data is globally
Gaussian, but that is hardly ever the case in practice. Computation of the
independent components cannot be done by solving an algebraic system of
equations, and instead must be done by numerically minimizing a criterion
function. Different criterion functions exist, based on kurtosis or other
higher-order moments, mutual information between marginals or entropy criteria,
all yielding comparable results for our application. We used the FastICA
algorithm described in [11].
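
To make the kurtosis-based criterion concrete, the sketch below computes the excess kurtosis of a one-dimensional sample, a standard measure of departure from Gaussianity (a Gaussian has excess kurtosis zero). It illustrates only the kind of quantity such criterion functions are built on; it is not the FastICA algorithm of [11], and the function name is hypothetical.

```python
def excess_kurtosis(samples):
    """Fourth standardized central moment minus 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n   # variance
    m4 = sum((x - mean) ** 4 for x in samples) / n   # fourth central moment
    return m4 / (m2 ** 2) - 3.0
```

A kurtosis-based ICA criterion searches for projection directions along which the absolute value of this quantity is as large as possible.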

5 Experimental Results and Discussion

Images were subsampled by a factor of 10 in each dimension prior to
experimentation. Visible images were de-meaned and normalized to unit norm in
order to provide some measure of illumination compensation. Thermal images were
processed via two-point radiometric calibration. Subspaces for PCA, LFA and ICA
were chosen to be 100-dimensional, and the LDA subspaces have as many
dimensions as classes in the training set, minus one.
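
The preprocessing steps just listed can be sketched as follows. Several details are assumptions made for illustration: subsampling is done by plain decimation with no smoothing, the calibration is a per-pixel linear map from gray value to radiant power fitted from two black-body images, and the radiant power values themselves are taken as given (their computation from temperature is described in [5], not here). All function names are hypothetical.

```python
import math


def subsample(image, factor):
    """Keep every `factor`-th pixel in each dimension (plain decimation)."""
    return [row[::factor] for row in image[::factor]]


def demean_unit_norm(vector):
    """Subtract the mean, then scale to unit Euclidean norm."""
    mean = sum(vector) / len(vector)
    centered = [v - mean for v in vector]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered] if norm > 0 else centered


def two_point_gain_offset(gray_1, gray_2, power_1, power_2):
    """Gain and offset for one pixel from two black-body measurements.

    gray_1, gray_2: gray values recorded at the two known temperatures.
    power_1, power_2: corresponding black-body radiant powers (given).
    """
    gain = (power_2 - power_1) / (gray_2 - gray_1)
    offset = power_1 - gain * gray_1
    return gain, offset


def calibrate(gray, gain, offset):
    """Map a raw gray value to radiant power using the linear model."""
    return gain * gray + offset
```

With a 240x320 input, `subsample(image, 10)` yields a 24x32 image, so each face is represented by a 768-dimensional vector before projection.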

For each valid pair of training and testing sets, we computed the top-match
recognition performance, and report it below in Tables 1, 2, 3, and 4. Each
column in a given table corresponds to a training set, and each row to a
testing set. Visible results are reported above the corresponding LWIR results.
Note that, over all experiments performed, results on visible imagery are
always inferior to those on LWIR imagery. This is not only the case for
testing/training pairs where the illumination conditions are different, but
indeed holds even for those pairs where we have no intuitive reason to expect
performance on LWIR to be superior.

Recognition performance on visible imagery, regardless of algorithm, is worst
for pairs where both illumination and facial expression vary between the
training and testing sets, followed by pairs where either illumination or
expression differs. Note that due to the reflective nature of visible light
imaging, a change in facial expression implies a change in shading (even in
uniform areas of the face) as a result of varying surface normals. Worst
performance for LWIR recognition occurs for similar condition pairs. We should
briefly mention that the largest improvement between algorithms on visible
imagery also occurs for these challenging pairs, indicating that more powerful
representational methods are better able to reject features with poor
classification potential.

Table 5 shows mean, minimum and maximum performances for each algorithm over
the multiple experiments in Tables 1, 2, 3, and 4. Mean results are weighted
according to the number of images in each testing set. The most notable
property of these results is that recognition performance is always better with
LWIR than with visible imagery. Average error is reduced anywhere from 47% to
83%, depending on the algorithm. Similar improvement is seen for the worst- and
best-case results. An additional measure of relative accuracy and stability of
recognition results in the visible versus LWIR is given by the average ratio of
worst to mean performance. For visible imagery we have a ratio of 0.719, while
for LWIR we have 0.936, which indicates that LWIR recognition is both more
accurate and more stable.
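
The two summary measures used above can be written out as follows. This is a sketch of the arithmetic only, with hypothetical function names; returning zero when the visible error is already zero is a convention chosen here. Values recomputed from rounded recognition rates need not match reported percentages exactly, since those may have been derived from unrounded rates.

```python
def error_reduction_percent(visible_rate, lwir_rate):
    """Percentage reduction in error when moving from visible to LWIR."""
    visible_error = 1.0 - visible_rate
    lwir_error = 1.0 - lwir_rate
    if visible_error == 0.0:
        return 0.0   # convention chosen here: no error left to reduce
    return 100.0 * (visible_error - lwir_error) / visible_error


def worst_to_mean_ratio(mean_rate, worst_rate):
    """Ratio of worst-case to mean recognition rate (1.0 = fully stable)."""
    return worst_rate / mean_rate
```

For example, a mean recognition rate of 0.93 in the visible and 0.97 in LWIR corresponds to errors of 0.07 and 0.03, a reduction of about 57%.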

Figure 3 shows representative receiver-operating-characteristic curves for each
algorithm and both modalities. We can see that LWIR imagery is superior not
only in terms of correct classification, but also in terms of lower false alarm
rates. In fact, in order to obtain recognition performance with visible imagery
comparable to top-match performance in LWIR, one must be willing to accept
untenable false-alarm levels. Figure 4 shows representative plots of
performance as a function of rank-ordered result. Once again, we see that
top-match performance in the LWIR is comparable to that obtained with visible
imagery when considering the top 10-50 matches. A more thorough analysis of
these phenomena can be found in [6].


Figure 3: ROC curves for Eigenfaces, LDA, LFAb and ICA, respectively.

Figure 4: Performance-vs-rank curves for Eigenfaces, LDA, LFAb and ICA, respectively.

Table 1: Eigenfaces performance.

Table 2: LDAt performance.

Table 3: LFAb performance.

Table 4: ICA performance.


Algorithm   Visible           LWIR              Error reduction %
PCA         0.73/0.36/0.97    0.95/0.89/1.00    83/83/100
LDAg        0.93/0.85/0.99    0.97/0.92/1.00    57/47/100
LDAt        0.92/0.76/1.00    0.98/0.94/1.00    77/74/0
LFAe        0.82/0.62/0.97    0.93/0.84/0.99    61/59/92
LFAb        0.85/0.63/0.98    0.93/0.83/0.99    47/53/73
ICA         0.88/0.72/0.99    0.94/0.86/1.00    49/50/100

Table 5: Weighted mean, minimum and maximum performance on each modality, plus
percentage reduction of error from visible to LWIR.

6 Conclusions

We performed a comprehensive comparison of classical and state-of-the-art
appearance-based face recognition algorithms applied to visible and LWIR
imagery. Building on previous work, we emphasized the role of varying the
training and testing sets as a tool to uncover strengths and weaknesses of
algorithms and imaging modalities. Confounding variation in imaging conditions
was minimized by collecting data with an innovative sensor capable of
simultaneous coregistered acquisition of both modalities.

It becomes clear from our analysis that LWIR imagery of human faces is not only
a valid biometric, but almost surely a superior one to comparable visible
imagery. This conclusion must be tempered somewhat by the fact that while our
data collection includes many challenging situations for visible recognition
algorithms, it may not contain sufficiently challenging ones for LWIR
recognition. Unfortunately, collecting such challenging imagery is costly and
complicated, since we must introduce variation due to ambient temperature,
wind, and metabolic processes in the subject. Nonetheless, such data collection
is currently underway, and experimental results will be reported elsewhere. As
noted in [5], while our current working database may not include the most
challenging scenarios for LWIR face recognition, it is representative of
uncontrolled indoor imagery, and thus our results are very encouraging in that
context.

Ongoing and future work includes analysis on more challenging LWIR imagery,
improved calibration methods to further reduce environmental distractors, and
most importantly fusion of both modalities. Preliminary results on fusion of
modalities are extremely promising, indicating that a further reduction of
error of 50% over LWIR performance may be possible.